We compute the Bayesian evidences for one- and two-parameter models of evolving dark energy,
and compare them to the evidence for a cosmological constant, using current data from Type
Ia supernovae, baryon acoustic oscillations, and the cosmic microwave background. We use only
distance information, ignoring dark energy perturbations. We find that, under various priors on
the dark energy parameters, ΛCDM is currently favoured as compared to the dark energy models.
We consider the parameter constraints that arise under Bayesian model averaging, and discuss the
implication of our results for future dark energy projects seeking to detect dark energy evolution. The
model selection approach complements and extends the figure-of-merit approach of the Dark Energy
Task Force in assessing future experiments, and suggests a significantly modified interpretation of
that statistic.



PACS numbers: 98.80.-k 



I. INTRODUCTION 



A key challenge for cosmology is to uncover the nature
of the force which is causing the Universe to expand at an
accelerating rate today. The cause, dubbed dark energy,
could be an unknown energy component with negative
pressure [1], a modification of general relativity [2], or
simply a cosmological constant. For reviews on the subject,
see for example Ref. [3].

There are many planned and proposed dark energy experiments
that aim to constrain dark energy parameters,
using a combination of complementary techniques. These
include the luminosity distance-redshift relation of Type
Ia supernovae (SNe Ia), the angular-diameter distance-redshift
and expansion rate-redshift relations measured
by baryon acoustic oscillations (BAO), and use of weak
gravitational lensing to probe the growth rate of structures.
The cosmic microwave background (CMB) also
provides a very useful handle on dark energy by pinning
down the distance to the last-scattering surface, and
also via the integrated Sachs-Wolfe effect and by detecting
clusters through the Sunyaev-Zel'dovich effect. Approaches
to constraining dark energy were overviewed in
the recent report of the DoE/NASA/NSF Dark Energy
Task Force (DETF) [4].

A primary aim of future experiments is to distinguish
evolving dark energy from a cosmological constant.
When seeking to compare models, especially with different
numbers of variable parameters, one should use
the concepts of model selection rather than those of parameter
estimation (e.g. Refs. [5, 6]). Model selection
quantifies how well the data conform to the overall predictions
of a model, which depends on model dimensionality
and model priors. In addressing the primary goal, a
satisfactory representation of many evolving dark energy
models turns out to be an unknown energy component
with equation of state $w(a) = w_0 + w_a(1-a)$, where
$a$ is the scale factor. Simpler alternatives may be the
constant $w$ model with negative pressure, and the cosmological
constant model with fixed $w = -1$. This is a
natural area for the application of model selection statistics
[7, 8], which we take up in this paper. Here we update
and extend work by Saini et al. [9], who were the first to
apply Bayesian model selection to dark energy models.
For alternative views on determining the number of dark
energy parameters, see Ref. [10].

We do not consider growth-of-structure constraints,
which ultimately will be required to distinguish between
dark energy and modified gravity models for the acceleration
[11, 12]. In the phenomenological approach adopted
here, the dynamical evolution of $w$ could be attributed to
either phenomenon. At present, the theory of the structure
formation growth factor is known only for specific modified
gravity models, and further development is needed before
such models can be usefully considered in the model selection
framework. In any case, at the present time these
observations are not competitive with the ones we use.

In this paper we compute the Bayesian evidence for
evolving dark energy versus that of a cosmological constant
given current distance measurements from CMB,
SN Ia, and BAO data, ignoring dark energy perturbations.
In light of this result, we discuss the probability
that future experiments will detect evolving dark energy,
and the implications of this in assessing the capabilities
of future experiments.



II. METHOD AND MODELS 

Bayesian model selection extends the usual parameter
estimation framework by assigning probabilities to sets of
parameters, known as models, as well as the usual probability
distributions of parameter values for each specific
choice of model. The key statistic of Bayesian model
selection is the Bayesian evidence $E$, being the average
likelihood of the model over its prior parameter ranges
[?, 13]. This quantity updates the prior model probability
to the posterior model probability, enabling one to
compare different models according to their probability.

The use of the Bayesian evidence has lagged behind
parameter estimation techniques in the cosmology literature
because of the difficulty in computing the required
integral to high accuracy, so as to be able to distinguish
between the models of interest. The nested sampling algorithm,
proposed by Skilling [14] and implemented for
cosmological applications by some of us in Ref. [13], has
proven to be computationally efficient and accurate. It is
a simple algorithm and more general than thermal methods
as used in Refs. [15, 16]. For instance, nested sampling
can handle multiphase problems, in which $\ln L$ is
not a concave function of $\ln X$, where $X$ is the cumulative
probability mass within isolikelihood surfaces and $L$
the likelihoods of the surfaces; thermal methods fail on
such problems (John Skilling, private communication).

We use the nested sampling algorithm to compute the
Bayesian evidences [?]. Our code, called CosmoNest,
is available at the URL www.cosmonest.org. Compared
to the public version, it was modified so that
instead of using the power spectra for each model, it used
the data and likelihoods described in the next section. As
we are not computing power spectra the calculation proceeds
very swiftly, taking just a few minutes to obtain
multiple estimates of the evidence of a model. The estimates
can then be combined into a mean evidence and
an error on that mean.
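
As a rough illustration of the procedure just described, the following minimal Python sketch estimates the evidence of a one-parameter toy model by nested sampling and combines eight runs into a mean and an error on that mean. It is not CosmoNest: the Gaussian toy likelihood, the function names, the settings, and the simple rejection-sampling replacement step are all assumptions made for illustration, and the mean is taken over the $\ln E$ estimates.

```python
import numpy as np

def log_likelihood(w, mu=-1.0, sigma=0.1):
    """Toy Gaussian likelihood in w with peak value 1 (hypothetical, not real data)."""
    return -0.5 * ((w - mu) / sigma) ** 2

def nested_sampling_log_evidence(loglike, lo, hi, n_live=50, n_iter=300, seed=0):
    """Estimate ln E for a flat prior on [lo, hi] with a very basic nested sampler."""
    rng = np.random.default_rng(seed)
    live = rng.uniform(lo, hi, size=n_live)
    live_logl = loglike(live)
    log_e = -np.inf      # running estimate of ln E
    log_x_prev = 0.0     # ln of the enclosed prior mass X (X = 1 at the start)
    for i in range(1, n_iter + 1):
        worst = int(np.argmin(live_logl))
        log_x = -i / n_live  # expected ln X after i replacements
        log_width = np.log(np.exp(log_x_prev) - np.exp(log_x))
        log_e = np.logaddexp(log_e, live_logl[worst] + log_width)
        # Replace the worst point by a prior draw with higher likelihood.
        # Rejection sampling is slow but transparent; used here for clarity only.
        threshold = live_logl[worst]
        while True:
            trial = rng.uniform(lo, hi)
            trial_logl = loglike(trial)
            if trial_logl > threshold:
                break
        live[worst], live_logl[worst] = trial, trial_logl
        log_x_prev = log_x
    # Add the contribution of the prior mass still held by the live points.
    log_mean_l = np.logaddexp.reduce(live_logl) - np.log(n_live)
    return float(np.logaddexp(log_e, log_x_prev + log_mean_l))

# Eight independent estimates, combined into a mean and an error on the mean.
estimates = np.array([
    nested_sampling_log_evidence(log_likelihood, -2.0, -0.33, seed=s)
    for s in range(8)
])
mean_log_e = estimates.mean()
error_on_mean = estimates.std(ddof=1) / np.sqrt(len(estimates))
print(f"ln E = {mean_log_e:.2f} +/- {error_on_mean:.2f}")
```

For this toy problem the evidence is known analytically (the Gaussian lies well inside the prior, so $E \simeq \sigma\sqrt{2\pi}/1.67$), which gives a simple check on the estimate.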

We consider five different models in all, corresponding
to different parametrizations of the equation of state
$w$ and/or different parameter priors. The basic models
are ΛCDM ($w = -1$, Model I), a one-parameter
model with constant $w$, and a two-parameter model
$w(a) = w_0 + w_a(1-a)$, where $w_0$ and $w_a$ are constants.
This last parametrization, introduced by Chevallier and
Polarski [18], is a good approximation to many dark energy
models, while the constant $w$ model is purely phenomenological.
In addition to the equation of state, each
model requires two further parameters to complete its
specification, the matter density $\Omega_m$ and the Hubble
constant $H_0$.

For the latter two parametrizations, we make two separate
choices of prior in order to explore the dependence of the
results on the prior. For the constant $w$ case these are $-1 < w < -0.33$
(Model II) and $-2 < w < -0.33$ (Model III), the former
enforcing the weak energy condition and the latter
allowing phantom models. For the two-parameter
model, Model IV has flat priors of $-2 < w_0 < -0.33$,
$-1.33 < w_a < 1.33$ (the prior on $w_a$ being particularly arbitrary),
while Model V corresponds to the quintessence
prior of $-1 < w(a) < 1$ imposed between $z = 0$ and 2.
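
The parametrization and the two-parameter priors above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the check for Model V relies on the observation that $w(a)$ is linear in $a$, so over $0 \le z \le 2$ its extremes lie at the endpoints $a = 1$ and $a = 1/3$.

```python
def w_cpl(a, w0, wa):
    """Equation of state w(a) = w0 + wa * (1 - a)."""
    return w0 + wa * (1.0 - a)

def in_model_iv_prior(w0, wa):
    """Flat prior of Model IV: -2 < w0 < -0.33 and -1.33 < wa < 1.33."""
    return -2.0 < w0 < -0.33 and -1.33 < wa < 1.33

def in_quintessence_prior(w0, wa, z_max=2.0):
    """Prior of Model V: -1 < w(a) < 1 for all redshifts 0 <= z <= z_max.

    Because w(a) is linear in a, checking the two endpoints is sufficient.
    """
    a_min = 1.0 / (1.0 + z_max)
    endpoints = (w_cpl(1.0, w0, wa), w_cpl(a_min, w0, wa))
    return all(-1.0 < w < 1.0 for w in endpoints)

# A point allowed by Model IV but excluded by the quintessence prior,
# since w(a) drops below -1 by z = 2:
print(in_model_iv_prior(-0.9, -0.5), in_quintessence_prior(-0.9, -0.5))
```

Note that with strict inequalities the point $w_0 = -1$, $w_a = 0$ sits on the boundary of the quintessence prior rather than inside it.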



III. OBSERVATIONAL DATA 



We use data in a manner very similar to Wang and
Mukherjee [19], which can be consulted for more details.

We use the CMB shift parameter measured by
the three-year Wilkinson Microwave Anisotropy Probe
(WMAP) observations [20, 21], of $R = 1.70 \pm 0.03$ [19],
which is mostly independent of assumptions made about
dark energy. The shift parameter $R$ is [22] [...]
In that case $R$ is well determined, as both $\Omega_m h^2$ and
$r(z_{\rm CMB})$ are accurately measured by CMB data.

We use the BAO measurement from the Sloan Digital
Sky Survey (SDSS) luminous red galaxies, $d_V(0.35) =
1.300 \pm 0.088$ Gpc [23], obtained from power spectrum
estimates and consistent with the result of Ref. [24] obtained
using the estimated correlation function. Here the
distance parameter is

$$ d_V(z_{\rm BAO}) = \left[ r^2(z_{\rm BAO}) \, \frac{c\, z_{\rm BAO}}{H(z_{\rm BAO})} \right]^{1/3}, \qquad (4) $$

where $r(z)$ is the comoving distance, and $H(z)$ is the
Hubble parameter. For the SDSS luminous red galaxies,
the mean survey redshift is $z_{\rm BAO} = 0.35$.¹

We use SN Ia data from the HST/GOODS programme
[25] (Riess04) and the first year Supernova Legacy Survey
[26] (Astier05), together with nearby SN Ia data.
The comparison of results from these two SN Ia datasets
provides a consistency check. We do not combine the
two SN Ia datasets, as they have systematic differences
in data processing; see the discussion in Ref. [19].

We use the Riess04 'gold' sample flux-averaged with
$\Delta z = 0.05$. This sample includes 9 SNe Ia at $z > 1$,
and appears to have systematic effects from weak lensing,
or another effect that mimics weak lensing qualitatively.
This would bias the distance estimates somewhat without
flux averaging [27, 28], and so we use it on these SNe [29].

We have also added a conservative estimate of the intrinsic
dispersion of SN Ia peak brightness, 0.15 mag, in
quadrature with the uncertainties on the distance moduli of
Astier05, rather than the smaller intrinsic dispersion derived
by them by requiring a reduced $\chi^2 = 1$ in their model fitting. This is


¹ The SDSS BAO result has been computed for a scalar spectral
index value of $n_s = 0.98$, and should be scaled by $(n_s/0.98)^{-0.35}$
[21] for a different 'best-fit' $n_s$. For $n_s \simeq 0.95$ following WMAP3
[21], this is an insignificant factor, which we nevertheless include.

TABLE I: The mean $\Delta \ln E$ relative to the ΛCDM model together with its uncertainty, the information content $H$, the minimum
$\chi^2$, and the parameter constraints, for each of the models considered and for each of two data combinations. Uncertainties
on $H_0$ are statistical only, and do not include systematic uncertainties. The models differ by virtue of the number of free
parameters, here in the dark energy sector, and/or the priors on those parameters. For reference, $\ln E$ for the ΛCDM model
was found to be $-20.1 \pm 0.1$ for the compilation with Riess04 and $-52.3 \pm 0.1$ for that with Astier05.

| Model | Data used (WMAP+SDSS+) | $\Delta \ln E$ | $H$ | $\chi^2_{\min}$ | Parameter constraints |
|---|---|---|---|---|---|
| Model I: $\Lambda$ | Riess04 | 0.0 | 5.7 | 30.5 | $\Omega_m = 0.26 \pm 0.03$, $H_0 = 65.5 \pm 1.0$ |
| | Astier05 | 0.0 | 6.5 | 94.5 | $\Omega_m = 0.25 \pm 0.03$, $H_0 = 70.3 \pm 1.0$ |
| Model II: constant $w$, flat prior $-1 < w < -0.33$ | Riess04 | $-0.1 \pm 0.1$ | 6.4 | 28.6 | $\Omega_m = 0.27 \pm 0.04$, $H_0 = 64.0 \pm 1.4$, $w < -0.81, -0.70$ (a) |
| | Astier05 | $-1.3 \pm 0.1$ | 8.0 | 93.3 | $\Omega_m = 0.24 \pm 0.03$, $H_0 = 69.8 \pm 1.0$, $w < -0.90, -0.83$ (a) |
| Model III: constant $w$, flat prior $-2 < w < -0.33$ | Riess04 | $-1.0 \pm 0.1$ | 7.3 | 28.6 | $\Omega_m = 0.27 \pm 0.04$, $H_0 = 64.0 \pm 1.5$, $w = -0.87 \pm 0.1$ |
| | Astier05 | $-1.8 \pm 0.1$ | 8.2 | 93.3 | $\Omega_m = 0.25 \pm 0.03$, $H_0 = 70.0 \pm 1.0$, $w = -0.96 \pm 0.08$ |
| Model IV: $w_0$-$w_a$, flat prior $-2 < w_0 < -0.33$, $-1.33 < w_a < 1.33$ | Riess04 | $-1.1 \pm 0.1$ | 7.2 | 28.5 | $\Omega_m = 0.27 \pm 0.04$, $H_0 = 64.1 \pm 1.5$, $w_0 = -0.83 \pm 0.20$, $w_a$ unconstrained (b) |
| | Astier05 | $-2.0 \pm 0.1$ | 8.2 | 93.3 | $\Omega_m = 0.25 \pm 0.03$, $H_0 = 70.0 \pm 1.0$, $w_0 = -0.97 \pm 0.18$, $w_a$ unconstrained (b) |
| Model V: $w_0$-$w_a$, $-1 < w(a) < 1$ for $0 < z < 2$ | Riess04 | $-2.4 \pm 0.1$ | 9.1 | 28.5 | $\Omega_m = 0.28 \pm 0.04$, $H_0 = 63.6 \pm 1.3$, $w_0 < -0.78, -0.60$ (a), $w_a = -0.07 \pm 0.34$ |
| | Astier05 | $-4.1 \pm 0.1$ | 11.1 | 93.3 | $\Omega_m = 0.24 \pm 0.03$, $H_0 = 69.5 \pm 1.0$, $w_0 < -0.90, -0.80$ (a), $w_a = 0.12 \pm 0.22$ |

(a) Where constraints on $w$ are shown as upper limits only, the values
are 68% and 95% marginalized confidence limits.
(b) $w_a$ is unconstrained in Model IV.



because the intrinsic dispersion in SN Ia peak brightness
should be derived from the distribution of nearby SNe
Ia, or SNe Ia from the same small redshift interval if the
distribution in the peak brightness evolves with cosmic
time. This distribution is not well known at present, but
will become better known as more SNe Ia are observed by
the Nearby Supernova Factory [30]. By using the larger intrinsic
dispersion, we allow some reasonable margin for the
uncertainties in the SN Ia peak brightness distribution.
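
The distance quantities used in this section can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the analysis code: it assumes a flat universe, neglects radiation, takes $z_{\rm CMB} = 1089$, and uses the commonly quoted form $R = \sqrt{\Omega_m}\, H_0\, r(z_{\rm CMB})/c$ for the shift parameter. The function names are hypothetical.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s

def hubble_rate(z, omega_m, h0, w0=-1.0, wa=0.0):
    """H(z) in km/s/Mpc for a flat universe without radiation (an assumption)."""
    a = 1.0 / (1.0 + z)
    # Dark energy density relative to today for w(a) = w0 + wa * (1 - a).
    x_de = a ** (-3.0 * (1.0 + w0 + wa)) * np.exp(-3.0 * wa * (1.0 - a))
    return h0 * np.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m) * x_de)

def comoving_distance(z, omega_m, h0, w0=-1.0, wa=0.0, n=4000):
    """r(z) in Mpc: trapezoidal integral of c/H on a grid uniform in ln(1+z)."""
    x = np.linspace(0.0, np.log1p(z), n)
    zz = np.expm1(x)
    f = C_KM_S * (1.0 + zz) / hubble_rate(zz, omega_m, h0, w0, wa)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

def shift_parameter(omega_m, h0, w0=-1.0, wa=0.0, z_cmb=1089.0):
    """CMB shift parameter R (dimensionless); z_cmb = 1089 is an assumed value."""
    r = comoving_distance(z_cmb, omega_m, h0, w0, wa)
    return np.sqrt(omega_m) * h0 * r / C_KM_S

def bao_distance(z, omega_m, h0, w0=-1.0, wa=0.0):
    """d_V(z) in Mpc, following Eq. (4)."""
    r = comoving_distance(z, omega_m, h0, w0, wa)
    return (r ** 2 * C_KM_S * z / hubble_rate(z, omega_m, h0, w0, wa)) ** (1.0 / 3.0)

def chi2_cmb_bao(omega_m, h0, w0=-1.0, wa=0.0):
    """Chi-squared from R = 1.70 +/- 0.03 and d_V(0.35) = 1.300 +/- 0.088 Gpc."""
    r_term = (shift_parameter(omega_m, h0, w0, wa) - 1.70) / 0.03
    dv_term = (bao_distance(0.35, omega_m, h0, w0, wa) / 1000.0 - 1.300) / 0.088
    return r_term ** 2 + dv_term ** 2

print(shift_parameter(0.25, 70.3), bao_distance(0.35, 0.25, 70.3))
```

A likelihood built this way depends only on distances, which is why each evaluation is cheap compared with computing power spectra.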



IV. RESULTS 

We calculate the Bayesian evidence as our primary
model selection statistic. We also calculate the information
content $H$ of the datasets, the best-fit $\chi^2$ values, and
the posterior parameter distributions within each model.
Our main focus is on the evidence and the parameter distributions.
All of these quantities are by-products of running
CosmoNest to evaluate the evidence of a model [13].



A. Bayesian evidence E 

The interpretational scale introduced by Jeffreys [31]
defines a difference in $\ln E$ of greater than 1 as significant,
greater than 2.5 as strong, and greater than 5 as decisive,
evidence in favour of the model with greater evidence.

Our results are summarized in Table I. The priors on
the equation of state parameters were given earlier and
are indicated in the table. Priors on the additional parameters
are $0.1 < \Omega_m < 0.5$ and $40 < H_0 < 90$. For each
model and data compilation we tabulate $\Delta \ln E$, the
mean $\ln E$ of the model concerned minus that of the ΛCDM
model (so that negative values disfavour the model), plus the
error on that difference, obtained from 8 estimates of the evidence of
each model. Thus the ΛCDM entry is zero by definition.

We find that the WMAP+SDSS(BAO)+Astier05 data
combination distinguishes amongst the models more
strongly than does the WMAP+SDSS(BAO)+Riess04 data,
while showing the same general trends. In what follows,
our discussion uses Astier05 throughout.

Overall, the ΛCDM model (Model I) is a simple model
that continues to give a good fit to the data. It is therefore
rewarded for its predictiveness with the largest evidence,
and remains the favoured model as found with an
earlier dataset (of SNe alone) by Saini et al. [9]. The other
models all show smaller evidences, though none are yet
decisively ruled out. Nevertheless, there is distinct evidence
against the two-parameter models, especially from
the compilation including Astier05. Model V has a wider
parameter range than Model IV and fares the worst, receiving
a large penalty for its lack of predictiveness of
the data. The one-parameter models lie somewhere in
between.

FIG. 1: 68% and 95% confidence constraints on $w_0$ and $w_a$ for Model IV (left plot) and Model V (right plot). Note the axis
ranges are different, to show the full prior ranges in each case. Model IV corresponds to a flat prior on the parameters over the
range plotted, and Model V corresponds to a quintessence prior which amounts to a flat prior within the region shown by dotted
lines (the contours go just a little out of that region due to the effect of binning the likelihoods of the obtained samples on a grid).
The solid contours are for WMAP+SDSS(BAO)+Astier05, and dot-dashed contours are for WMAP+SDSS(BAO)+Riess04.

We interpret these results in the following section. 



B. The information H 

The information content of the data $H$ is defined as minus
the logarithm of the amount by which the posterior is
compressed inside the prior by the data. We compute it
from the posterior samples generated using nested sampling
[17], and tabulate the values. $H$ gives some indication
of how many parameters a data set can support, as
usually $H \approx N \log(\mathrm{signal/noise})$ where $N$ is the number
of parameters [14]. If $H$ changes significantly when new
parameters are added, then that implies that the data
have the potential to constrain the additional parameters
effectively, and therefore have something conclusive
to say about the distinction between the two models via
the evidence. $H$ is similar to the effective complexity of
a model, as discussed in Ref. [32]. By definition, $H$ depends
on the prior, and the higher $H$ is, the better the
posterior is confined with respect to the prior.
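
For concreteness, the verbal definition above corresponds to the relative entropy of the posterior with respect to the prior, which is the form commonly used in the nested sampling literature. The notation below ($\pi(\theta)$ for the prior, $L(\theta)$ for the likelihood, $\Delta$ and $\sigma$ for the widths in the example) is introduced here for illustration only.

```latex
H \;=\; \int P(\theta|D)\,\ln\frac{P(\theta|D)}{\pi(\theta)}\,d\theta
  \;=\; \frac{1}{E}\int L(\theta)\,\ln L(\theta)\,\pi(\theta)\,d\theta \;-\; \ln E ,
\qquad
E \;=\; \int L(\theta)\,\pi(\theta)\,d\theta .

% Example: one parameter, flat prior of width \Delta, Gaussian posterior of
% width \sigma lying well inside the prior range:
H \;\simeq\; \ln\frac{\Delta}{\sigma\sqrt{2\pi e}} .
```

In this one-parameter example, widening the prior at fixed posterior width raises $H$, in line with the statement that $H$ depends on the prior.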

C. Best-fit $\chi^2$

The best-fit $\chi^2$ obtained for each data set is listed in
Table I, mainly for reference. The values were obtained
from the highest-likelihood point found by the nested
sampling algorithm. This will be close to, though not
precisely at, the maximum, because the stopping criterion
for the nested sampling algorithm concerns the
convergence of the integral that estimates the evidence;
the algorithm is not directly searching for the maximum-likelihood
point. A naive model selection test, formalized
as the likelihood ratio test, compares the difference in
these best-fit values to the difference in number of model
parameters. This does not however have a probabilistic
interpretation, as the probability of the model is a property
of its entire parameter range, not simply its best-fit
values [12]. It ignores parameter priors and correlations.

Nevertheless, the lack of any significant improvement
in $\chi^2_{\min}$ when going from the constant $w$ models to the
$w_0$-$w_a$ models could be used to conclude that the data
do not favour going to the two-parameter model.



D. Posterior parameter distributions 

Parameter constraints for each of the models, obtained
as described in Ref. [17] from the same samples that were
used to compute the evidence, are tabulated in the final
column of Table I. Likelihood contours for the dark energy
parameters in Models IV and V are shown in Fig. 1,
the contours in Model V being significantly cut off by the
prior. 1D marginalized parameter constraints are shown
in Fig. 2.



V. DISCUSSION: THE PRESENT PICTURE 

The above results have quantified the impact of current
data in constraining the models we have selected for
investigation. There are considerable modelling uncertainties,
both in the choice of parameter priors for each
model, and in assigning prior model probabilities. For the
latter, we have chosen to take them as equal, but anyone
who thinks otherwise can readily account for it; regardless
of what someone thinks about two models before
looking at the data (the prior model probabilities), the
evidence unambiguously states how that view is changed
by the data. For the parameter priors, we have analyzed
two choices for each model dimensionality to investigate
the extent of the dependence.

FIG. 2: Marginalized posterior parameter distributions in Models II, III, IV, and V, using the WMAP+SDSS(BAO)+Astier05
data combination.

In analyzing data in a situation where the correct
choice of model is unknown, these uncertainties are unavoidable,
but one can nevertheless use the Bayesian
framework to draw conclusions. Within the model selection
viewpoint, one should first ask about the status of
the various models under discussion, and only then move
on to consider parameter constraints.



A. Models 

In comparing the model evidences, there are different
ways to proceed. First, we can consider all five models as
independent, so that converting the $\Delta \ln E$ into posterior
model probabilities, assuming equal prior model probabilities,
gives 63%, 17%, 10%, 9%, and 1% for the five
models respectively. Consequently, while we cannot say
that any of the models is decisively ruled out, the balance
of probability is currently tilted significantly in favour of
ΛCDM, and the two-parameter equation of state model
fares the worst.
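
The conversion from $\Delta \ln E$ to posterior model probabilities can be reproduced with a short Python sketch. The input values are the Astier05 rows of Table I; the function names and the labels returned for the Jeffreys scale are illustrative choices, not taken from any code used for the paper.

```python
import math

# Delta ln E relative to LCDM, WMAP+SDSS(BAO)+Astier05 rows of Table I.
DELTA_LN_E = {"I": 0.0, "II": -1.3, "III": -1.8, "IV": -2.0, "V": -4.1}

def model_probabilities(delta_ln_e, priors=None):
    """Posterior model probabilities from Delta ln E and prior model probabilities.

    Equal priors are assumed when none are given.
    """
    names = list(delta_ln_e)
    if priors is None:
        priors = {name: 1.0 / len(names) for name in names}
    peak = max(delta_ln_e.values())  # subtract the maximum for numerical safety
    weights = {n: priors[n] * math.exp(delta_ln_e[n] - peak) for n in names}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}

def jeffreys_label(abs_delta_ln_e):
    """Verbal grade on the Jeffreys scale as quoted in the text."""
    if abs_delta_ln_e > 5.0:
        return "decisive"
    if abs_delta_ln_e > 2.5:
        return "strong"
    if abs_delta_ln_e > 1.0:
        return "significant"
    return "not significant"

probs = model_probabilities(DELTA_LN_E)
for name, p in probs.items():
    print(f"Model {name}: {100 * p:.0f}%  ({jeffreys_label(abs(DELTA_LN_E[name]))})")

# Restricting attention to a subset of models simply renormalizes the weights.
subset = model_probabilities({k: DELTA_LN_E[k] for k in ("I", "II", "V")})
```

Passing a `priors` dictionary with unequal entries shows how a different prior view of the models would change the outcome.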

Alternatively, we can consider each parametrization as
representing a model, and within each parametrization
marginalize over the different choices of prior that we
considered plausible. In this approach we average the
evidences (not their logarithms, as it is the evidences
themselves which represent the model probability) to
obtain $\Delta \ln E = -1.5$ for the constant $w$ model and
$\Delta \ln E = -2.6$ for the two-parameter model. The corresponding
probabilities are then 77%, 18%, and 5% for
ΛCDM, the one-parameter, and two-parameter dark energy
models respectively. This approach gives quite similar
results to the above, while avoiding penalizing the
ΛCDM model for only having one choice of prior. However
for the remainder of the paper we will not average
over models in this way.
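
The averaging over prior choices can be sketched as follows, using the rounded Astier05 values from Table I. The function name is hypothetical, and equal weight for the two prior choices within each parametrization is assumed.

```python
import math

def average_delta_ln_e(values):
    """ln of the mean evidence ratio, given Delta ln E for each prior choice.

    The evidences themselves are averaged, not their logarithms.
    """
    peak = max(values)
    mean_ratio = sum(math.exp(v - peak) for v in values) / len(values)
    return peak + math.log(mean_ratio)

constant_w = average_delta_ln_e([-1.3, -1.8])  # Models II and III
two_param = average_delta_ln_e([-2.0, -4.1])   # Models IV and V

weights = [1.0, math.exp(constant_w), math.exp(two_param)]
probs = [w / sum(weights) for w in weights]
print(round(constant_w, 1), round(two_param, 1), [round(p, 3) for p in probs])
```

With these rounded inputs the averaged $\Delta \ln E$ values match those quoted above, and the probabilities agree with the quoted ones to within about a percentage point; the small differences presumably reflect rounding of the tabulated values.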

Finally, we might be interested only in a subset of the
models; for instance, we may consider only the models
that do not allow $w < -1$ (Models I, II, and V), motivated
by quintessence models. Amongst these models the
probability is divided as 78%, 21%, and 1% respectively.

TABLE II: Parameter constraints from Bayesian model averaging using the WMAP+SDSS(BAO)+Astier05 data combination.
Since the distributions of the dark energy parameters are generally non-Gaussian and/or asymmetric about the mean, their 68%
and 95% marginalized limits are separately indicated. Some confidence limits for $w_a$ are precisely zero due to the delta-function
contribution from Models I, II, and III superimposed on the extended tails from Models IV and V.

| Models used | Parameter constraints |
|---|---|
| all five models | $\Omega_m = 0.25 \pm 0.03$, $H_0 = 70.1 \pm 1.0$, $w_0 = -0.97$ [...], $w_a = 0.0$ [...] |
| Models I, II, & V | $\Omega_m = 0.24 \pm 0.03$, $H_0 = 70.1 \pm 1.0$, $w_0 < -0.98, -0.86$, $w_a = 0.0$ [...] |

Whichever choice is made, the overall conclusion is
that the ΛCDM model is preferred by present data, but
that there are non-negligible probabilities for the models
of evolving dark energy. We will explore the implications
of this for future dark energy searches in Section VI.



B. Parameter values and Bayesian model averaging 

We now consider the implications of the model selection
framework for constraints on the cosmological
parameters. Within each model the usual parameter
probability distribution analysis applies, as given in Section
IV D. However we now need to combine these to derive
parameter constraints that account for model uncertainty
(the uncertainty in which model is the true model).
The appropriate tool to carry this out is Bayesian model
averaging, which is nicely summarized by Hoeting et
al. [33].²

The basic idea is quite simple; rather than having a
single probability distribution for a parameter, we instead
have a superposition of its distributions in different
models, weighted by the relative model probability.
In some models the parameter may have a fixed value
(e.g. $w = -1$ in ΛCDM), and then that component
of the distribution is an appropriately normalized delta-function.
The set-up is analogous to quantum mechanics;
whilst the true model is uncertain the distribution lies in
a superposition of states, with the possibility that future
measurements may collapse the probability into one of
the models.

The posterior probabilities of the models are given via
Bayes' theorem by

$$ P(M_k|D) = \frac{P(D|M_k)\,P(M_k)}{\sum_j P(D|M_j)\,P(M_j)}, \qquad (5) $$



² Bayesian model averaging has only been used once previously
in cosmology, in interpreting simulated galaxy cluster data [34],
and only very occasionally in astrophysics/geophysics [35]. A distinct
idea, closely related to the themes of this article, is Bayesian
survey design, which averages an experimental figure of merit over
a set of possible cosmological models [36].



where $P(D|M_k)$ is the evidence of model $M_k$. Here
$P(M_k)$ are prior model probabilities, which we take to
be equal across the models. Any other choice can be
incorporated if required.

Within a Gaussian approximation, it is easy to write
down suitable expressions for model averaging the parameter
means and variances [33], but it is practically as
easy to manipulate the full distributions given by the parameter
chains. One simply takes the chains from each
model and weights them according to the model probability.
That the elements will have noninteger weights
is no problem (indeed CosmoNest chains, unlike those
generated by Markov Chain Monte Carlo, already have
noninteger weights, with the weights within each chain
summing to unity). All the chains can then be analyzed
together by the usual means such as the getdist package
of CosmoMC [37].
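
The chain-weighting step can be sketched as below. This is a toy illustration, not the procedure's actual implementation: the samples are synthetic, the function names are hypothetical, and the 90%/10% split is simply the summed probability quoted earlier for the models with $w_a$ fixed to zero (Models I to III) versus those in which it varies (Models IV and V).

```python
import numpy as np

def model_average(chains, model_probs):
    """Merge per-model weighted samples into one chain weighted by model probability.

    `chains` maps a model name to (samples, sample_weights); the weights within
    each chain are normalized to unity before applying the model probability.
    """
    values, weights = [], []
    for name, (samples, sample_weights) in chains.items():
        w = np.asarray(sample_weights, dtype=float)
        values.append(np.asarray(samples, dtype=float))
        weights.append(model_probs[name] * w / w.sum())
    return np.concatenate(values), np.concatenate(weights)

def weighted_quantile(values, weights, q):
    """Value at which the weighted cumulative distribution first reaches q."""
    order = np.argsort(values)
    cdf = np.cumsum(weights[order]) / np.sum(weights)
    index = min(int(np.searchsorted(cdf, q)), len(values) - 1)
    return float(values[order][index])

rng = np.random.default_rng(1)
chains = {
    # Models with w_a fixed: a delta-function at w_a = 0.
    "fixed": (np.array([0.0]), np.array([1.0])),
    # Synthetic stand-in for a chain from a model with free w_a (not real data).
    "evolving": (rng.normal(0.1, 0.5, size=5000), np.ones(5000)),
}
model_probs = {"fixed": 0.90, "evolving": 0.10}

wa, weight = model_average(chains, model_probs)
limits_68 = (weighted_quantile(wa, weight, 0.16), weighted_quantile(wa, weight, 0.84))
limits_95 = (weighted_quantile(wa, weight, 0.025), weighted_quantile(wa, weight, 0.975))
print(limits_68, limits_95)
```

Because most of the weight sits in the delta-function, the 68% limits of the averaged distribution collapse onto $w_a = 0$, while the 95% limits are set by the tails of the evolving-model chain.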

One shouldn't overstate the usefulness of this method, 
as the details depend on a lot of prior information: the 
precise choice of models, including their prior parameter 
ranges, to be averaged, and also the prior model proba- 
bilities. Nevertheless there are some general qualitative 
lessons to be learned. 

The most important such lesson is that if one is seeking
to limit a parameter around some special fiducial
value, e.g. $w = -1$ for the equation of state, then the parameter
errors are typically going to be overestimated if
one ignores model uncertainty. The reason is that in the
absence of a detection, a substantial part of the model
probability is always going to be placed in the embedded
model (in this case ΛCDM), which adds a delta-function
to the probability distribution and hence suppresses the
tails where the limits will be imposed.

The parameter constraints obtained from Bayesian
model averaging the models together are summarized in
Table II. Because of the presence of delta-functions in
the averaged distribution, in some cases confidence limits
can be precisely zero. The posterior distributions for
the parameters are shown in Fig. 3, the left set of panels
showing averaging over all five models, and the right set
averaging the Λ model with the quintessence-type models
(II and V).

The probability distributions of the parameters de- 
rived from current data, after taking into account model 
uncertainty by Bayesian model averaging over the mod- 
els allowed by the data, summarize our current state of 
knowledge regarding these parameters. In this case we 



[Figure 3 panel titles: "BMA: all 5 models" (left); "BMA: models I, II & V" (right)]


FIG. 3: Posterior parameter distributions obtained using the WMAP+SDSS(BAO)+Astier05 data combination from Bayesian 
model averaging (BMA). The left set of four panels averages over all the five models under consideration, and the right over 
the quintessence-type models (I, II and V) alone. Some smoothing of the delta-functions has been carried out by binning. 



can see that even though many models are still allowed
by the data, given the weight of the ΛCDM model, the
constraints have tightened significantly around $w_0 = -1$
and $w_a = 0$. For instance, compare the model-averaged
constraints of Fig. 3 (left) with the Model IV constraints
in Fig. 1.

Note that Bayesian model averaging is an intrinsic part
of the model selection framework, not an optional extra.
As soon as one concedes that there might be different
model descriptions of data to which probabilities should
be assigned, consistent inference requires those
probabilities to be properly accounted for in deriving parameter
probability distributions. This is necessary too
for consistent model selection forecasting, as we now describe.



VI. DISCUSSION: IMPLICATIONS FOR 
FUTURE SURVEYS 

We now consider the implications of these results for
future surveys, keeping this final discussion qualitative.
In keeping with the previous section, an analysis of future
data should first assess the validity of the various models
being considered, and only then move on to parameter
estimation. The model comparison may be decisive, leaving
only one model on the table (which might be either
ΛCDM or one of the evolving models), or it may still
leave several viable models. If only one model survives
then standard parameter estimation tools will be valid;
otherwise model averaging should again be deployed to
study parameter distributions.



The main aim of forecasting the power of future surveys
is to enable informed choices as to which projects
to fund. The Dark Energy Task Force (DETF) recently
produced an influential report [4] quantifying the capabilities
of a wide range of proposed experiments to constrain
dark energy. Following ideas from Ref. [38], they defined
a Figure-of-Merit (FoM) as the inverse of the area inside
the 95% contour in the $w_0$-$w_a$ plane, for a fiducial ΛCDM
model. Normalizing to present knowledge, this factor is
typically a few to a few tens for proposed experiments of
increasing sophistication.³
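
For a Gaussian posterior in the $w_0$-$w_a$ plane, the area inside a given contour follows directly from the parameter covariance matrix, so a figure of merit of this kind can be sketched as below. The Gaussian approximation, the function names, and the covariance values are assumptions for illustration only; they are not taken from the DETF report.

```python
import math

def contour_area(cov, level=0.95):
    """Area of the ellipse enclosing `level` of a 2D Gaussian with covariance `cov`."""
    (c00, c01), (c10, c11) = cov
    det = c00 * c11 - c01 * c10
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    # For two parameters the enclosed probability is 1 - exp(-delta_chi2 / 2).
    delta_chi2 = -2.0 * math.log(1.0 - level)
    return math.pi * delta_chi2 * math.sqrt(det)

def figure_of_merit(cov, level=0.95):
    """Inverse of the area inside the chosen contour."""
    return 1.0 / contour_area(cov, level)

# Hypothetical covariances for (w0, wa): a 'present' survey and a 'future' one
# whose errors on both parameters are halved at fixed correlation.
present = [[0.04, -0.05], [-0.05, 0.25]]
future = [[0.01, -0.0125], [-0.0125, 0.0625]]
print(figure_of_merit(future) / figure_of_merit(present))
```

Normalizing to the present-day value, as in the last line, gives the relative improvement factor; halving both errors at fixed correlation shrinks the area by a factor of four.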

The DETF FoM presumes that the two-parameter
dark energy model is the true one (i.e. that $w_0$ and $w_a$
are parameters to be varied in fitting the data), and
quantifies the extent to which future experiments will
compress the allowed parameter range about the point
$w_0 = -1$ and $w_a = 0$. What it does not do is allow
for the possibility that the two-parameter model is not
correct. To quote from the abstract of Ref. [33], "Data
analysts typically select a model from some class of models
and then proceed as if the selected model had generated
the data. This approach ignores the uncertainty in model
selection, leading to inferences that are more risky than
one thinks they are." One way to avoid this problem
is to employ model selection forecasting, as described in
Ref. [8], which proposed a FoM based on the parameter
area in which ΛCDM cannot be strongly excluded using
the Bayesian evidence.

³ These ideas have also been extended beyond survey comparison
to the issue of survey design by Bassett and collaborators [36].

We stress that the DETF FoM is a perfectly good way
of distinguishing the capabilities of different experiments,
even though it is a parameter estimation tool and those
experiments are primarily seeking to answer model selection
questions. It is entirely reasonable to believe that
an experiment which is better at estimating parameters
within a model will also be better at model selection of
that model against embedded models. Our aim here is to
advise caution against over-interpreting the DETF FoM
in terms of the probability that an upcoming experiment
will actually detect dark energy evolution.

The model selection considerations we have outlined
have three important implications in interpreting the
DETF FoM.

1. The chances of detecting dark energy evolution are
much less than implied by the fractional shrinkage
of parameter area. For example, if the FoM says
the area in the $w_0$-$w_a$ plane will shrink by a factor
10, this does not mean a 90% chance that evolution
will be detected. This comment matches most
people's intuition, but is quantified by the realization
in model selection that a substantial part of
the probability lies in the ΛCDM model. If this
model is true, then obviously evolution cannot be
detected as that would rule out the true model.
Since present knowledge puts most of the probability
in ΛCDM, as shown above, we can immediately
conclude that the current chances of even an arbitrarily
good experiment detecting dark energy evolution
are less than half (with the significant caveat
of the various model and parameter priors we have
assumed).

2. There is a substantial probability that ΛCDM is the 
correct model, but the DETF FoM does not quantify 
how well experiments will determine this. If 
ΛCDM is the true model, then the outcome of future 
experiments will be to support that model. 
This too would be a highly valuable outcome. In 
this case, there is another model-selection-based 
FoM, described in Ref. [8], which evaluates the 
strength with which an upcoming experiment is expected 
to deliver a model selection verdict in favour 
of ΛCDM under the assumption that that model 
is correct. Model selection approaches have the 
crucial property that, unlike parameter estimation 
methods, they can accrue positive support for the 
simpler model. As shown in Ref. [H, advanced ex- 
periments are capable of decisively ruling out the 
two-parameter models in favour of ΛCDM (see also 
Ref. [39]). This then is the answer to the often-asked 
question of how far we have to tighten constraints 
on dark energy parameters before we can 
start to believe that ΛCDM is the true model. This 
question is often asked with parameter estimation 
forecasting techniques in mind, but the answer lies 
in model selection. A design goal of future experiments 
should be that they are able to give a decisive 
verdict for ΛCDM if it is the true model. 

3. If evolution is neither detected nor decisively excluded, 
the DETF FoM will overestimate the parameter 
errors. It overestimates because it does 
not incorporate Bayesian model averaging. A powerful 
experiment that fails to detect evolution is 
bound to push most of the model probability into 
the ΛCDM model, so that the eventual combined 
parameter chain includes only a small fraction of elements 
from the w_0-w_a model. That is to say, the 
delta-function of probability at w = -1 will contain 
most of the posterior distribution. So an experiment 
which does not detect evolution will impose 
more powerful constraints than the FoM indicates. 
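
The effect of Bayesian model averaging described in point 3 can be illustrated with a minimal numerical sketch. It is not the analysis carried out in this paper: the model probabilities (0.8 for ΛCDM, 0.2 for the w_0-w_a model), the Gaussian posterior of width 0.1 on w within the evolving model, the chain lengths, and the function name model_average are all assumptions chosen purely for illustration. The sketch builds a combined chain by drawing from each model's chain in proportion to its model probability, and compares the spread of w before and after averaging.

```python
import numpy as np

def model_average(chains, model_probs, n_total, rng):
    """Build a model-averaged chain by drawing elements from each
    model's chain in proportion to its posterior model probability."""
    names = list(chains)
    probs = np.array([model_probs[name] for name in names], dtype=float)
    # Number of elements each model contributes to the combined chain.
    counts = rng.multinomial(n_total, probs / probs.sum())
    parts = [rng.choice(chains[name], size=int(c), replace=True)
             for name, c in zip(names, counts)]
    return np.concatenate(parts)

rng = np.random.default_rng(0)

# Hypothetical within-model posteriors for the equation of state w.
chains = {
    "LCDM": np.full(5000, -1.0),            # delta function at w = -1
    "w0-wa": rng.normal(-1.0, 0.1, 5000),   # assumed Gaussian, width 0.1
}
# Assumed posterior model probabilities (illustrative, not from the paper).
model_probs = {"LCDM": 0.8, "w0-wa": 0.2}

averaged = model_average(chains, model_probs, n_total=20000, rng=rng)
frac_at_lcdm = np.mean(averaged == -1.0)

print("fraction of averaged chain at w = -1:", frac_at_lcdm)
print("std of w within the w0-wa model:", chains["w0-wa"].std())
print("std of w after model averaging:", averaged.std())
```

Under these assumed inputs most of the combined chain sits exactly at w = -1, so the model-averaged spread in w is narrower than the spread within the evolving model alone, which is the sense in which the FoM overestimates the parameter errors.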

None of the above affects the validity of the DETF 
FoM as a tool for quantifying the capabilities of differ- 
ent experiments, though one should bear in mind that 
it may prove inadequate if the true model is more complicated 
than the w_0-w_a model. Nevertheless, while 
the DETF FoM may correctly rank experiments relative 
to one another, since the principal goal of dark energy 
experiments is one of model selection, we would advo- 
cate where possible also analyzing their capabilities using 
model selection forecasting tools as described in Ref. [8] 
and this paper. 

The approach we have outlined incorporates, modifies 
and extends the Expected Posterior Odds (ExPO) technique 
pioneered by Trotta. This approach splits the 
model parameter space into regions where different model 
selection verdicts are expected, and then averages these 
over the current distribution in parameter space to obtain 
a probability of each outcome. Of course, only by actu- 
ally doing the experiment do you discover which outcome 
does arise. Trotta did not however fully implement the 
model selection/Bayesian model averaging framework, as 
he computed the present probability distribution within 
one model only, whereas in Ref. [41] multiple models were 
included in an ExPO-type forecast. Ref. [1| extended 
ExPO to delineate parameter space regions where differ- 
ent model selection outcomes would be expected, and to 
define model selection figures of merit. The present paper 
further extends the framework to estimation of parame- 
ter uncertainties via Bayesian model averaging as well as 
calculation of model probabilities. 
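
The logic of an ExPO-type forecast can be sketched in a few lines, under strong simplifying assumptions that are not those of this paper. The sketch uses a single constant equation of state w instead of the w_0-w_a model, assumes the future experiment returns a Gaussian posterior of width sigma centred on the fiducial value (ignoring noise realizations), estimates the Bayes factor with the Savage-Dickey density ratio for a flat prior that comfortably contains that posterior, and uses an arbitrary verdict threshold on ln B. The prior range, sigma, threshold, the current distribution of w, and all function names are hypothetical.

```python
import numpy as np

def ln_bayes_factor(w_fid, sigma, prior_lo, prior_hi):
    """Savage-Dickey estimate of ln B for LCDM (w = -1) against a
    constant-w model with a flat prior on [prior_lo, prior_hi].
    Assumes a Gaussian future posterior centred on w_fid."""
    ln_post_at_lcdm = (-0.5 * ((-1.0 - w_fid) / sigma) ** 2
                       - np.log(np.sqrt(2.0 * np.pi) * sigma))
    ln_prior_at_lcdm = -np.log(prior_hi - prior_lo)
    return ln_post_at_lcdm - ln_prior_at_lcdm

def outcome_probabilities(w_samples, sigma, prior_lo, prior_hi, threshold):
    """Average the expected verdict over the current distribution of w."""
    ln_b = ln_bayes_factor(np.asarray(w_samples), sigma, prior_lo, prior_hi)
    p_lcdm = float(np.mean(ln_b > threshold))
    p_evol = float(np.mean(ln_b < -threshold))
    return {
        "supports LCDM": p_lcdm,
        "favours w != -1": p_evol,
        "inconclusive": 1.0 - p_lcdm - p_evol,
    }

rng = np.random.default_rng(1)

# Hypothetical current model-averaged distribution of w:
# 80% of the probability at w = -1, 20% spread with width 0.1.
w_samples = np.concatenate([np.full(800, -1.0),
                            rng.normal(-1.0, 0.1, 200)])

probs = outcome_probabilities(w_samples, sigma=0.01,
                              prior_lo=-2.0, prior_hi=-1.0 / 3.0,
                              threshold=2.5)
for outcome, p in probs.items():
    print(outcome, p)
```

Because the fiducial values are drawn from a distribution that already places most of its weight at w = -1, the probability of a verdict against ΛCDM in this sketch can never exceed the weight assigned to the alternative model, however small sigma is made.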

VII. CONCLUSIONS 

We have carried out a model selection analysis of dark 
energy models, updating and expanding on an earlier 
analysis by Saini et al. [9]. We find, as did they, that 
the preferred model is the ΛCDM model, and indeed we 
find that the two-parameter w_0-w_a model is quite significantly 
disfavoured already by present data. 
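
As a reminder of how evidences translate into statements of this kind, the following sketch converts log-evidences into posterior model probabilities using Bayes' theorem, P(M_i|D) proportional to E_i P(M_i). The ln-evidence values, the equal prior model probabilities, and the function name model_probabilities are hypothetical placeholders and are not the results of this paper.

```python
import numpy as np

def model_probabilities(ln_evidence, prior=None):
    """Posterior model probabilities from log-evidences.
    Equal prior model probabilities are assumed if none are given."""
    names = list(ln_evidence)
    ln_e = np.array([ln_evidence[name] for name in names], dtype=float)
    if prior is None:
        pri = np.full(len(names), 1.0 / len(names))
    else:
        pri = np.array([prior[name] for name in names], dtype=float)
    ln_post = ln_e + np.log(pri)
    ln_post -= ln_post.max()          # guard against overflow
    post = np.exp(ln_post)
    post /= post.sum()
    return dict(zip(names, post))

# Hypothetical ln-evidences relative to LCDM (illustrative values only).
ln_evidence = {"LCDM": 0.0, "constant w": -1.0, "w0-wa": -2.5}

probs = model_probabilities(ln_evidence)
for name, p in probs.items():
    print(name, round(float(p), 3))

# If LCDM is true, evolution cannot be detected, so the chance of any
# experiment detecting it is bounded by the probability not in LCDM.
upper_bound_on_detection = 1.0 - probs["LCDM"]
print("upper bound on detection probability:", upper_bound_on_detection)
```

The bound in the last lines is the argument made earlier in the text: whatever probability the data assign to ΛCDM is unavailable to a detection of evolution, subject to the model and parameter priors assumed.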

We have made a first use of the concept of Bayesian 
model averaging [33] to obtain current cosmological parameter 
uncertainties. Bayesian model averaging generalizes 
the usual Bayesian parameter estimation methods 
to the situation where the choice of model is uncertain, 
and in the absence of detections typically significantly 
strengthens parameter constraints. Finally, we have de- 
scribed how to use this framework to project the prob- 
abilities of different outcomes to future dark energy ex- 
periments, and in particular to interpret the meaning of 
the figure-of-merit introduced by the Dark Energy Task 
Force Q. 

We conclude that based on present knowledge the 
probability of future experiments detecting dark energy 
evolution is rather small, unless the various prior assumptions 
of our analysis prove to be ill-founded. This is simply 
because present data places the majority of the probability 
in the ΛCDM model. On the other hand, high-precision 
experiments may be able to decisively support 
the ΛCDM model, this ability being measured by a model 
selection figure-of-merit given in Ref. [8]. If ΛCDM is not 
picked decisively, and neither is dark energy evolution de- 
tected, then they can give tighter limits on dark energy 
parameters than one would infer from the DETF figure- 
of-merit. 